Domain generalization (DG) aims to train a model that performs well in unseen domains with different distributions. This paper considers a more realistic yet more challenging scenario, namely Single Domain Generalization (Single-DG), where only a single source domain is available for training. To tackle this challenge, we first try to understand when neural networks fail to generalize. We empirically identify a property of a model that correlates strongly with its generalization, which we term "model sensitivity". Based on our analysis, we propose a novel strategy, Spectral Adversarial Data Augmentation (SADA), to generate augmented images targeted at the highly sensitive frequencies. Models trained with these hard-to-learn samples can effectively suppress the sensitivity in the frequency space, which leads to improved generalization performance. Extensive experiments on multiple public datasets demonstrate the superiority of our approach, which surpasses state-of-the-art Single-DG methods.
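Regression plays an essential role in many medical imaging applications for estimating various clinical risk or measurement scores. While training strategies and loss functions have been studied for deep neural networks in medical image classification tasks, options for regression tasks are very limited. One of the key challenges is that the high-dimensional feature representation learned by existing popular loss functions, such as mean squared error or L1 loss, is hard to interpret. In this paper, we propose a novel Regression Metric Loss (RM-Loss), which endows the representation space with the semantic meaning of the label space by finding a representation manifold that is isometric to the label space. Experiments on two regression tasks, namely coronary artery calcium score estimation and bone age assessment, show that RM-Loss is superior to the existing popular regression losses in both performance and interpretability. Code is available at https://github.com/dial-rpi/regression-metric-loss.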
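The idea of a representation that is isometric to the label space can be illustrated with a small sketch. The abstract does not give the loss formula, so the function below is only an assumption about one simple form such a loss could take: it penalizes the gap between pairwise Euclidean feature distances and scaled pairwise label distances. The name `isometry_loss` and the `scale` parameter are hypothetical, and Euclidean distance stands in for distance along the learned manifold.

```python
import math
from itertools import combinations


def isometry_loss(features, labels, scale=1.0):
    """Mean absolute gap between pairwise feature and label distances.

    Illustrative assumption only, not the published RM-Loss.
    features: list of equal-length float vectors, one per sample.
    labels:   list of scalar regression targets, one per sample.
    """
    pairs = list(combinations(range(len(labels)), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        # Distance between the two samples in representation space.
        feat_dist = math.dist(features[i], features[j])
        # Distance between the same two samples in label space.
        label_dist = scale * abs(labels[i] - labels[j])
        total += abs(feat_dist - label_dist)
    return total / len(pairs)
```

Under this sketch, features that lie on a line with spacing proportional to the labels give a loss of zero, while features that collapse to a single point are penalized by the average label distance.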
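Federated learning is an emerging paradigm that allows large-scale decentralized learning without sharing data across different data owners, which helps address data privacy concerns in medical image analysis. However, the requirement of existing methods for label consistency across clients largely narrows their scope of application. In practice, each clinical site may annotate only certain organs of interest, with partial or no overlap with other sites. Incorporating such partially labeled data into a unified federation is an unexplored problem with clinical significance and urgency. This work tackles the challenge with a novel federated multi-encoding U-Net (Fed-MENU) method for multi-organ segmentation. In our method, a multi-encoding U-Net (MENU-Net) is proposed to extract organ-specific features through different encoding sub-networks. Each sub-network can be seen as an expert on a specific organ and is trained for that client. Moreover, to encourage the organ-specific features extracted by different sub-networks to be informative and distinctive, we regularize the training of the MENU-Net by designing an auxiliary generic decoder (AGD). Extensive experiments on four public datasets show that our Fed-MENU method can effectively obtain a federated learning model from partially labeled datasets, with performance superior to that of other models trained by either localized or centralized learning methods. Source code will be made publicly available at the time of paper publication.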
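In established network architectures, shortcut connections are commonly used to feed the outputs of earlier layers as inputs to later layers. Despite the extraordinary effectiveness of shortcuts, open questions remain about their mechanism and characteristics. For example, why are shortcuts powerful? Why do shortcuts generalize well? In this paper, we investigate the expressivity and generalizability of a novel sparse shortcut topology. First, we show that this topology enables a one-neuron-wide deep network to approximate any univariate continuous function. Then, we present a novel width-bounded universal approximator, in contrast to depth-bounded universal approximators, and extend the approximation result to a family of equally competent networks. Furthermore, using generalization bound theory, we show that the proposed shortcut topology enjoys excellent generalizability. Finally, we corroborate our theoretical analysis by comparing the proposed topology with popular architectures, including ResNet and DenseNet, on well-known benchmarks. Our work helps improve the understanding of the role of shortcuts and suggests further opportunities to innovate neural architectures.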
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which holds back graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
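Selecting features by information gain can be sketched as follows. This is a minimal illustration of the standard definition (label entropy minus the label entropy conditioned on the feature) for discrete features, not the MGTAB pipeline itself. The function names and the example feature names used with it are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(columns, labels, k):
    """Names of the k features with the greatest information gain.

    columns: dict mapping a feature name to its list of discrete values.
    """
    ranked = sorted(
        columns,
        key=lambda name: information_gain(columns[name], labels),
        reverse=True,
    )
    return ranked[:k]
```

Calling `top_k_features(columns, labels, 20)` on a table of discretized user properties would return the 20 highest-ranked names under this definition. Continuous properties would need to be binned first, which this sketch leaves out.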
Learning feature interactions is the key to success in large-scale CTR prediction and recommendation. In practice, handcrafted feature engineering usually requires exhaustive searching. To reduce the high cost of human effort in feature engineering, researchers have proposed several deep neural network (DNN)-based approaches that learn feature interactions in an end-to-end fashion. However, existing methods either do not learn vector-wise and bit-wise interactions simultaneously, or fail to combine them in a controllable manner. In this paper, we propose a new model, xDeepInt, based on a novel network architecture called polynomial interaction network (PIN), which learns higher-order vector-wise interactions recursively. By integrating a subspace-crossing mechanism, we enable xDeepInt to balance the mixture of vector-wise and bit-wise feature interactions at a bounded order. Based on the network architecture, we customize a combined optimization strategy to conduct feature selection and interaction selection. We implement the proposed model and evaluate its performance on three real-world datasets. Our experimental results demonstrate the effectiveness of xDeepInt over state-of-the-art models. We open-source the TensorFlow implementation of xDeepInt: https://github.com/yanyachen/xDeepInt.
In this paper, we study the problem of knowledge-intensive text-to-SQL, in which domain knowledge is necessary to parse expert questions into SQL queries over domain-specific tables. We formalize this scenario by building a new Chinese benchmark KnowSQL consisting of domain-specific questions covering various domains. We then address this problem by presenting formulaic knowledge, rather than by annotating additional data examples. More concretely, we construct a formulaic knowledge bank as a domain knowledge base and propose a framework (ReGrouP) to leverage this formulaic knowledge during parsing. Experiments using ReGrouP demonstrate a significant 28.2% improvement overall on KnowSQL.
Witnessing the impressive achievements of pre-training techniques on large-scale data in computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit to mitigate the sample inefficiency problem in visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains a large amount of information irrelevant to decision making, which makes predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for policy pre-training in visuomotor driving. We aim to learn policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns a driving policy representation by predicting the future ego-motion and optimizing with the photometric error, based on the current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving-policy-related representations and is thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios demonstrate the superiority of our proposed approach, with improvements ranging from 2% to over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
The interview is regarded as one of the most crucial steps in recruitment. To fully prepare for interviews with recruiters, job seekers usually practice with mock interviews among themselves. However, such a mock interview with peers is generally far from the real interview experience: the mock interviewers are not guaranteed to be professional and are unlikely to behave like a real interviewer. With the rapid growth of online recruitment in recent years, recruiters tend to hold interviews online, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from online interview data and provide mock interview services to job seekers. The task is challenging in two ways: (1) interview data are now available but still low-resource; (2) generating meaningful and relevant interview dialogs requires a thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector from the dialog generator, so that most parameters can be trained with ungrounded dialogs and with resume data, neither of which is low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results in generating mock interviews. With the help of EZInterviewer, we hope to make mock interview practice easier for job seekers.